Abstract The aim of this paper is to present the convergence analysis of a very general class of gradient projection methods for smooth, constrained, possibly nonconvex, optimization. The key features of these methods are the Armijo linesearch along a suitable descent direction and the non-Euclidean metric employed to compute the gradient projection. We develop a very general framework from the point of view of block-coordinate descent methods, which are useful when the constraints are separable.
1 Introduction

This paper deals with the problem

$$\min_{x \in \Omega} f(x), \tag{1}$$

where $\Omega \subseteq \mathbb{R}^n$ is a closed and convex set and $f$ is a continuously differentiable function. The aim of this work is to generalize the class of gradient projection methods whose basic iteration is given by

$$x^{(k+1)} = x^{(k)} + \lambda^{(k)} (y^{(k)} - x^{(k)}), \tag{2}$$

where $y^{(k)}$ is the Euclidean projection of $x^{(k)} - \sigma_k \nabla f(x^{(k)})$ onto $\Omega$, i.e.

$$y^{(k)} = P_\Omega(x^{(k)} - \sigma_k \nabla f(x^{(k)})) = \arg\min_{x \in \Omega} \|x - x^{(k)} + \sigma_k \nabla f(x^{(k)})\|^2, \tag{3}$$

and $\sigma_k > 0$, $\lambda^{(k)} \in (0,1]$ control the steplength.

Iteration (2)-(3) is also referred to as the gradient projection method with linesearch along the descent direction [5,18], which depends on two parameters $\lambda^{(k)}$, $\sigma_k$. Usually, in iteration (2), $\lambda^{(k)}$ is adaptively computed to ensure the sufficient decrease of the objective function and, thus, the convergence of the whole scheme, while $\sigma_k$ is a 'free' parameter which can be chosen in order to improve the effectiveness of the algorithm (see e.g. [4,12,13,15]).
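As an illustration only, iteration (2)-(3) can be sketched in Python for the special case where $\Omega$ is a box, so that the Euclidean projection reduces to componentwise clipping. The function names, the box constraint, the default values of `sigma`, `beta` and `delta`, and the test problem are assumptions made for this sketch and are not taken from the paper.

```python
def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^n (assumed feasible set)."""
    return [min(max(xi, lo), hi) for xi in x]


def gradient_projection(f, grad, x, lo, hi, sigma=1.0, beta=1e-4, delta=0.5,
                        max_iter=100):
    """Sketch of iteration (2)-(3) with Armijo backtracking on lambda."""
    for _ in range(max_iter):
        g = grad(x)
        # (3): y = P_Omega(x - sigma * grad f(x))
        y = project_box([xi - sigma * gi for xi, gi in zip(x, g)], lo, hi)
        d = [yi - xi for yi, xi in zip(y, x)]
        slope = sum(gi * di for gi, di in zip(g, d))  # grad f(x)^T (y - x) <= 0
        if slope > -1e-14:  # y coincides with x (numerically): x is stationary
            break
        lam, fx = 1.0, f(x)
        # Backtrack until the sufficient decrease condition holds
        while f([xi + lam * di for xi, di in zip(x, d)]) > fx + beta * lam * slope:
            lam *= delta
        # (2): x <- x + lambda * (y - x)
        x = [xi + lam * di for xi, di in zip(x, d)]
    return x


# Hypothetical test problem: minimize (x0 - 2)^2 + (x1 + 1)^2 over [0, 1]^2
f = lambda x: (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2
grad = lambda x: [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]
x_star = gradient_projection(f, grad, [0.5, 0.5], 0.0, 1.0)
```

On this toy problem the unconstrained minimizer lies outside the box, so the iterates stop at the corner of the box closest to it, where the projected point coincides with the current iterate.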

In our analysis, we extend the convergence results about the gradient projection method (2)-(3) to the more general case where $y^{(k)}$ is defined as

$$y^{(k)} = \arg\min_{x \in \Omega} h_{\sigma^{(k)}}(x, x^{(k)}) \tag{4}$$

and $h_\sigma$ is a suitable strictly convex function depending on the array of parameters $\sigma \in \mathbb{R}^q$. The choice of $h_\sigma$ can be addressed by taking into account some features of problem (1). For example, $h_{\sigma^{(k)}}(x, x^{(k)})$ may represent a local approximation of $f$ at $x^{(k)}$, or may play the role of barrier for a given constraint set, forcing the iterates to stay in the interior of it [1-3].

In particular, we present our results in the more general framework of the block-coordinate methods, which are useful when the constraint set in (1) has a separable structure, i.e. $\Omega = \Omega_1 \times \dots \times \Omega_m$, with $\Omega_i \subseteq \mathbb{R}^{n_i}$, $\sum_{i=1}^m n_i = n$, so that any $x \in \Omega$ can be block partitioned as $x = (x_1^T, \dots, x_m^T)^T$, $x_i \in \mathbb{R}^{n_i}$.

Such methods are based on the idea of performing successive minimizations over each block, as in the classical nonlinear Gauss-Seidel method [5]:

$$x_i^{(k+1)} \in \arg\min_{x \in \Omega_i} f(x_1^{(k+1)}, \dots, x_{i-1}^{(k+1)}, x, x_{i+1}^{(k)}, \dots, x_m^{(k)}). \tag{5}$$


However, the convergence of this approach is not ensured without quite restrictive convexity assumptions (see [5,17]) and, in addition, computing an exact minimum of $f$, even if restricted to a single block, can be impractical. On the other hand, inspired by the idea of (5), effective methods able to handle general nonconvex problems and with global convergence properties can be designed [7,9,16].

In this paper we further develop the cyclic block gradient projection method proposed in [7], allowing generalized projections based on non-Euclidean distances. In particular, we propose a method consisting in applying a finite number of iterations of the form (2)-(4) to each subproblem of type (5) and we show that any limit point of the generated sequence is stationary without any convexity assumption. Our general framework includes, but is not limited to, several state-of-the-art methods, such as the scaled gradient projection method [8], the spectral projected gradient method [6], the cyclic block gradient projection method [7] and the successive convex approximation algorithm [19].


The paper is organized as follows: in Section 2 we devise the properties of the operator $h_\sigma$ in (4) which allow us to reformulate the stationarity condition of (1) by means of a class of generalized projection operators. We also show that they can be used to design families of descent directions. Building on this material and on the well known properties of the Armijo linesearch procedure, in Section 3 we define a block-coordinate generalized gradient projection method and we develop the related convergence analysis. Our conclusions are given in Section 4.


2 Generalized gradient projections 

In this section we give the definition of a generalized projection operator, 
providing some examples of well-known functions belonging to this category. 

Definition 1 Let $S \subseteq \mathbb{R}^q$. We call metric function associated to $f$ any continuously differentiable function $h_\sigma : \Omega \times \Omega \to \mathbb{R}$ such that for any choice of the parameter $\sigma \in S$ the following properties hold:

(H1) $h_\sigma$ is convex with respect to its first argument, i.e.

$$h_\sigma(y,z) \ge h_\sigma(x,z) + \nabla_1 h_\sigma(x,z)^T (y - x) \quad \forall x,y,z \in \Omega \tag{6}$$

and, for any $z \in \Omega$, $h_\sigma(\cdot, z)$ admits a unique minimum point;

(H2) for any point $x \in \Omega$ and for any feasible direction $d \in \mathbb{R}^n$ we have

$$\nabla_1 h_\sigma(x,x)^T d = \nabla f(x)^T d; \tag{7}$$

(H3) $h_\sigma$ continuously depends on the parameter $\sigma$.

We denote by $\mathcal{H}(f, \Omega, S)$ the set of the metric functions satisfying properties (H1)-(H3) and, for any $h_\sigma \in \mathcal{H}(f, \Omega, S)$, we define the associated generalized gradient projection operator $p(\,\cdot\,; h_\sigma) : \Omega \to \Omega$ as

$$p(x; h_\sigma) = \arg\min_{z \in \Omega} h_\sigma(z, x) \quad \forall x \in \Omega. \tag{8}$$


Example 1 Properties (6)-(7) are satisfied when the function $h_\sigma$ is defined as

$$h_\sigma(x, y) = \nabla f(y)^T (x - y) + d_\sigma(x, y), \tag{9}$$

where $d_\sigma \in \mathcal{D}(\Omega)$. In these settings we can find:

a) the standard Euclidean projection $p(x; h_\sigma) = P_\Omega(x - \sigma \nabla f(x))$, obtained by choosing

$$d_\sigma(x, y) = \frac{1}{2\sigma} \|x - y\|^2, \quad \sigma > 0; \tag{10}$$

b) the scaled Euclidean projection, considered for example in [6,8], corresponding to the choice

$$d_\sigma(x, y) = \frac{1}{2\alpha} (x - y)^T D (x - y). \tag{11}$$

In this case the array of parameters $\sigma$ is given by the pair $(\alpha, D)$, where $\alpha \in \mathbb{R}_{>0}$ and $D \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix;
c) the Bregman distance associated to a strictly convex function $b : \Omega \to \mathbb{R}$, which is defined as

$$d_\sigma(x, y) = \frac{1}{\sigma} \left( b(x) - b(y) - \nabla b(y)^T (x - y) \right), \quad \sigma > 0. \tag{12}$$
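A well-known instance of the Bregman choice c), given here as a hedged illustration and not as part of the paper's analysis, takes $\Omega$ as the unit simplex and $b(x) = \sum_j x_j \log x_j$. Writing the optimality conditions of (9) with this distance gives the closed form $y_j \propto x_j \exp(-\sigma\, \partial f(x)/\partial x_j)$. The function name and the sample data below are hypothetical, and $x$ is assumed to have strictly positive entries.

```python
import math


def entropic_projection(x, g, sigma):
    """Generalized projection for h_sigma as in (9) with the Bregman distance
    generated by b(x) = sum_j x_j log x_j on the unit simplex (assumed setting).
    g is the gradient of f at x; x must have strictly positive entries."""
    w = [xi * math.exp(-sigma * gi) for xi, gi in zip(x, g)]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical data: uniform starting point, gradient penalizing the first entry
x = [1.0 / 3.0] * 3
g = [1.0, 0.0, 0.0]
y = entropic_projection(x, g, sigma=1.0)

# The direction y - x satisfies grad f(x)^T (y - x) <= 0
slope = sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))

# With a zero gradient the point is left unchanged (fixed point)
unchanged = entropic_projection(x, [0.0, 0.0, 0.0], sigma=1.0)
```

The multiplicative form keeps the iterate inside the simplex without any explicit projection step, which is one reason non-Euclidean distances of this kind are attractive for such constraint sets.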


Example 2 If $f$ is convex, a further class of functions satisfying the properties of Definition 1 is given by

$$h_\sigma(x, y) = f(x) + d_\sigma(x, y), \tag{13}$$

where again $d_\sigma \in \mathcal{D}(\Omega)$. If $d_\sigma$ is chosen as in (10), the resulting projection operator leads to the so-called resolvent (or proximity) operator associated to $f$ (see e.g. [10,11] or [14] for the general case).

Example 3 Consider the case when $f = f_0 + f_1$, where $f_0, f_1 : \Omega \to \mathbb{R}$ and $f_0$ is convex. Then, the function defined as

$$h_\sigma(x, y) = f_0(x) + d_\sigma(x, y) + \nabla f_1(y)^T (x - y) \quad \forall x, y \in \Omega, \tag{14}$$

with $d_\sigma \in \mathcal{D}(\Omega)$, belongs to $\mathcal{H}(f, \Omega, S)$. If $d_\sigma$ reduces to (10), the corresponding projection operator is also known as the proximal gradient operator, which is employed to define forward-backward splitting algorithms for convex optimization [11,19].
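To make Example 3 concrete, the following sketch evaluates the proximal gradient operator in a case where it has a closed form. It assumes $\Omega = \mathbb{R}^n$, $f_0(x) = \frac{\mu}{2}\|x\|^2$ and $d_\sigma(x,y) = \frac{1}{2\sigma}\|x-y\|^2$; these choices, as well as the function name and the sample values, are illustrative assumptions rather than part of the paper. Setting the gradient of (14) with respect to $x$ to zero gives $p(y; h_\sigma) = (y - \sigma \nabla f_1(y)) / (1 + \sigma\mu)$.

```python
def prox_grad_quadratic(y, grad_f1_y, sigma, mu):
    """Minimizer over x in R^n of (14) with f0(x) = (mu/2)||x||^2 and
    d_sigma(x, y) = ||x - y||^2 / (2 sigma) (assumed choices)."""
    return [(yi - sigma * gi) / (1.0 + sigma * mu) for yi, gi in zip(y, grad_f1_y)]


# Hypothetical smooth term f1(x) = 0.5 * ||x - c||^2, so grad f1(y) = y - c
c = [3.0, -6.0]
mu, sigma = 2.0, 0.5
grad_f1 = lambda y: [yi - ci for yi, ci in zip(y, c)]

# One forward-backward step from the origin
y0 = [0.0, 0.0]
step = prox_grad_quadratic(y0, grad_f1(y0), sigma, mu)

# The minimizer of f0 + f1 is c / (1 + mu); it is a fixed point of the operator
x_min = [ci / (1.0 + mu) for ci in c]
fixed = prox_grad_quadratic(x_min, grad_f1(x_min), sigma, mu)
```

The fixed-point check at the end reflects the characterization of stationary points as fixed points of the generalized projection operator, here verified on a problem whose minimizer is known analytically.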

Observe that the metric functions defined in (13)-(14) are majorants of the objective function, that is $h_\sigma(x, y) \ge f(x)$ for all $x, y \in \Omega$. Further, any convex upper bound function in the sense of [19, Assumption 1] admitting a unique minimum point clearly satisfies the premises of Definition 1.

Remark. For the sake of simplicity, in Definition 1 we assume $h_\sigma$ and $f$ to be smooth functions, but this could be relaxed, requiring only the existence of directional derivatives. Indeed, properties (6) and (7), as well as the analysis carried out in the rest of this section, could be reformulated in terms of directional derivatives.

In general, any function $h_\sigma \in \mathcal{H}(f, \Omega, S)$ can be exploited to define a descent direction for problem (1), as stated in the following proposition.

Proposition 1 Let $x \in \Omega$, $\sigma \in S \subseteq \mathbb{R}^q$, $h_\sigma \in \mathcal{H}(f, \Omega, S)$ and

$$y = p(x; h_\sigma). \tag{15}$$

Then we have that

$$\nabla f(x)^T (y - x) \le 0 \tag{16}$$

and the equality holds if and only if $y = x$.

Proof. Inequality (6) with $z = x$ yields

$$\nabla_1 h_\sigma(x, x)^T (y - x) \le h_\sigma(y, x) - h_\sigma(x, x) \le 0,$$

where the rightmost inequality follows from (8) and, since the minimum point of $h_\sigma(\cdot, x)$ is unique, the equality holds if and only if $x = y$. Then, the thesis follows recalling (7). □

In the following proposition, we show that the stationary points of (1) can 
be characterized as fixed points of the generalized projection operator (8). 

Proposition 2 Let $S \subseteq \mathbb{R}^q$, $\sigma \in S$ and $h_\sigma \in \mathcal{H}(f, \Omega, S)$. A point $x \in \Omega$ is a stationary point for problem (1) if and only if $x = p(x; h_\sigma)$.


Proof. Assume that for a point $x^* \in \Omega$ the following equality holds:

$$x^* = \arg\min_{x \in \Omega} h_\sigma(x, x^*).$$

Then, the stationarity of $x^*$ for this minimization problem yields

$$\nabla_1 h_\sigma(x^*, x^*)^T (x - x^*) \ge 0 \quad \forall x \in \Omega.$$

Since by assumption (7) we have $\nabla_1 h_\sigma(x^*, x^*)^T (x - x^*) = \nabla f(x^*)^T (x - x^*)$, it follows that $x^*$ is a stationary point for problem (1).

Conversely, let $x^* \in \Omega$ be a stationary point of (1) and define

$$\bar{x} = \arg\min_{x \in \Omega} h_\sigma(x, x^*).$$

Assume by contradiction that $x^* \ne \bar{x}$. Then, combining (6) with $x = z = x^*$, $y = \bar{x}$ and (7) we obtain

$$\nabla f(x^*)^T (\bar{x} - x^*) \le h_\sigma(\bar{x}, x^*) - h_\sigma(x^*, x^*) < 0,$$

where the last inequality follows from the fact that $\bar{x}$ is the unique minimum point of $h_\sigma(\cdot, x^*)$ and $x^* \ne \bar{x}$. This contradicts the stationarity assumption on $x^*$. □


3 Cyclic block generalized gradient projection method 

In this section we consider problem (1) where the constraint set has the following separable structure

$$\Omega = \Omega_1 \times \dots \times \Omega_m, \quad \Omega_i \subseteq \mathbb{R}^{n_i}, \quad \sum_{i=1}^m n_i = n \tag{17}$$

so that any $x \in \Omega$ can be block partitioned as $x = (x_1^T, \dots, x_m^T)^T$, $x_i \in \mathbb{R}^{n_i}$. The key ingredients of our approach are the sufficient decrease of the objective function enforced by a block version of the well known Armijo backtracking procedure and a suitable metric function $h_\sigma \in \mathcal{H}(f, \Omega, S)$ defined so that it is separable with respect to the partition in (17).

We first recall in Algorithm 1 the block version of the Armijo linesearch method. In the following proposition we give conditions which guarantee that Algorithm 1 is well defined. Its proof can be derived from known results (see [5,17]).

Algorithm 1 Armijo linesearch algorithm

Let $\{z^{(k)}\}_{k \in \mathbb{N}}$ be a sequence of points in $\Omega$ and $\{d_i^{(k)}\}_{k \in \mathbb{N}}$ a sequence of descent directions, for a given $i \in \{1, \dots, m\}$. Fix $\delta, \beta \in (0,1)$ and compute $\lambda_i^{(k)}$ as follows:

1. Set $\lambda_i^{(k)} = 1$.

2. If

$$f(z_1^{(k)}, \dots, z_i^{(k)} + \lambda_i^{(k)} d_i^{(k)}, \dots, z_m^{(k)}) \le f(z^{(k)}) + \beta \lambda_i^{(k)} \nabla_i f(z^{(k)})^T d_i^{(k)} \tag{18}$$

   Then go to step 3.
   Else set $\lambda_i^{(k)} = \delta \lambda_i^{(k)}$ and go to step 2.

3. End
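A minimal Python sketch of the backtracking loop in Algorithm 1 is given below. It assumes that the iterate is stored as a list of blocks, that the initial trial steplength is 1, and that the caller supplies the block gradient; the function name, the `max_backtracks` safeguard and the example data are hypothetical additions for illustration.

```python
def armijo_block(f, z, i, d_i, grad_i, beta=1e-4, delta=0.5, max_backtracks=60):
    """Return a steplength lam for block i such that
    f(z with block i moved by lam * d_i) <= f(z) + beta * lam * slope."""
    slope = sum(g * d for g, d in zip(grad_i, d_i))  # grad_i f(z)^T d_i, expected < 0
    fz = f(z)
    lam = 1.0
    for _ in range(max_backtracks):
        trial = [list(block) for block in z]
        trial[i] = [zi + lam * di for zi, di in zip(z[i], d_i)]
        if f(trial) <= fz + beta * lam * slope:
            return lam
        lam *= delta  # shrink the step and test again
    raise RuntimeError("no acceptable steplength found; is d_i a descent direction?")


# Hypothetical example with two scalar blocks: f(z) = 10 z_1^2 + z_2^2
f = lambda z: 10.0 * z[0][0] ** 2 + z[1][0] ** 2
z = [[1.0], [1.0]]
grad_0 = [20.0 * z[0][0]]      # gradient with respect to block 0
d_0 = [-g for g in grad_0]     # steepest descent direction for block 0
lam = armijo_block(f, z, 0, d_0, grad_0)
```

In this example the full step overshoots badly, so the loop halves the steplength four times before the sufficient decrease test is passed; only block 0 is ever modified, while block 1 stays fixed.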

Proposition 3 Let $\{z^{(k)}\}_{k \in \mathbb{N}}$ be a sequence of points in $\Omega$. Assume that $z^{(k)}$ converges to some $\bar{z}$ and for $i \in \{1, \dots, m\}$ let $\{d_i^{(k)}\}_{k \in \mathbb{N}}$ be a sequence of feasible directions such that

(A1) there exists a number $M > 0$ such that $\|d_i^{(k)}\| \le M$ for all $k \in \mathbb{N}$;

(A2) we have $\nabla_i f(z^{(k)})^T d_i^{(k)} < 0$ for all $k \in \mathbb{N}$;

(A3) we have $\lim_{k \to \infty} f(z^{(k)}) - f(z_1^{(k)}, \dots, z_i^{(k)} + \lambda_i^{(k)} d_i^{(k)}, \dots, z_m^{(k)}) = 0$, where $\lambda_i^{(k)}$ is computed with Algorithm 1.

Then, for each $k \in \mathbb{N}$ the linesearch procedure terminates in a finite number of steps and, furthermore, $\lim_{k \to \infty} \nabla_i f(z^{(k)})^T d_i^{(k)} = 0$.

In order to formally introduce the method and perform its convergence analysis, we choose the metric function $h_\sigma \in \mathcal{H}(f, \Omega, S)$, where $S = S_1 \times \dots \times S_m$, $S_i \subseteq \mathbb{R}^{q_i}$, such that the parameter $\sigma$ can be partitioned as $\sigma = (\sigma_1, \dots, \sigma_m)$. Moreover, we define $h_\sigma$ so that it is separable over the $m$ blocks with respect to its first variable, i.e.

$$h_\sigma(x, y) = \sum_{i=1}^m h^i_{\sigma_i}(x_i, y), \tag{19}$$

where the functions $h^i_{\sigma_i} : \Omega_i \times \Omega \to \mathbb{R}$ satisfy the following conditions:

(BH1) $h^i_{\sigma_i}$ is convex with respect to its first argument and admits a unique minimum point;

(BH2) for any point $x \in \Omega$ and for any vector $d \in \mathbb{R}^{n_i}$ such that $x_i + d \in \Omega_i$ we have

$$\nabla_1 h^i_{\sigma_i}(x_i, x)^T d = \nabla_i f(x)^T d, \tag{20}$$

where $\nabla_i f(x)$ denotes the gradient of $f$ with respect to the $i$-th block of variables;

(BH3) $h^i_{\sigma_i}$ continuously depends on the parameter $\sigma_i \in S_i$.

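It is easy to see that the metric function defined in (19), thanks to the assumptions (BH1)-(BH3), belongs to $\mathcal{H}(f, \Omega, S)$ and the associated generalized gradient projection can be also partitioned by blocks as

$$p(x; h_\sigma) = \left( p_1(x; h^1_{\sigma_1}), \dots, p_m(x; h^m_{\sigma_m}) \right), \quad \text{where } p_i(x; h^i_{\sigma_i}) = \arg\min_{z_i \in \Omega_i} h^i_{\sigma_i}(z_i, x). \tag{21}$$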


Lemma 1 Let $x \in \Omega$ and $\sigma \in S \subseteq \mathbb{R}^q$. Then,

(i) $x$ is stationary for problem (1) if and only if $p_i(x; h^i_{\sigma_i}) = x_i$ $\forall i = 1, \dots, m$;
(ii) $\nabla_i f(x)^T (p_i(x; h^i_{\sigma_i}) - x_i) \le 0$ $\forall i = 1, \dots, m$ and the equality holds if and only if $x_i = p_i(x; h^i_{\sigma_i})$.

Part (i) of the previous Lemma directly follows from (21) and from Proposition 2, while part (ii) can be easily proved by employing the same arguments as in the proof of Proposition 1.


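Algorithm 2 Cyclic Block Generalized Gradient Projection Method

Define a compact set $S$ and a metric function $h_\sigma \in \mathcal{H}(f, \Omega, S)$ as in (19). Choose $\beta, \delta \in (0,1)$.
Choose $x^{(0)} \in \Omega$ and the upper bounds for the inner iterations numbers $L_1, \dots, L_m$.

For $k = 0, 1, 2, \dots$

  1 Set $z(k, 0) = x^{(k)}$

  2 For $i = 1, \dots, m$

    2.1 Set $x_i^{(k,0)} = x_i^{(k)}$

    2.2 Choose the inner iterations number $L_i^{(k)} \le L_i$

    2.3 For $\ell = 0, \dots, L_i^{(k)} - 1$

      2.3.0 Set $x^{(k,\ell)} = (x_1^{(k+1)}, \dots, x_{i-1}^{(k+1)}, x_i^{(k,\ell)}, x_{i+1}^{(k)}, \dots, x_m^{(k)})$

      2.3.1 Choose the parameter $\sigma_i^{(k,\ell)} \in S_i$

      2.3.2 Compute the descent direction

            $$d_i^{(k,\ell)} = p_i(x^{(k,\ell)}; h^i_{\sigma_i^{(k,\ell)}}) - x_i^{(k,\ell)}$$

            and set $d^{(k,\ell)} = (0, \dots, 0, d_i^{(k,\ell)}, 0, \dots, 0)$

      2.3.3 Compute with Algorithm 1 the Armijo steplength $\lambda_i^{(k,\ell)}$ such that

            $$f(x^{(k,\ell)} + \lambda_i^{(k,\ell)} d^{(k,\ell)}) \le f(x^{(k,\ell)}) + \beta \lambda_i^{(k,\ell)} \nabla_i f(x^{(k,\ell)})^T d_i^{(k,\ell)}$$

      2.3.4 Set $x_i^{(k,\ell+1)} = x_i^{(k,\ell)} + \lambda_i^{(k,\ell)} d_i^{(k,\ell)}$

    End

    2.4 Set $x_i^{(k+1)} = x_i^{(k, L_i^{(k)})}$

    2.5 Set $z(k, i) = (x_1^{(k+1)}, \dots, x_i^{(k+1)}, x_{i+1}^{(k)}, \dots, x_m^{(k)})$

  End

  3 Set $x^{(k+1)} = z(k, m)$

End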
The previous results can be exploited to design a cyclic block generalized gradient projection (CBGGP) method, whose steps are outlined in Algorithm 2. Before analyzing the convergence properties of this approach, we observe that it is a descent method and, in particular, the objective function is nonincreasing over the partial updates $z(k, i)$, $i = 0, \dots, m$, $k = 1, 2, \dots$ defined at step 2.5. Indeed, the following inequalities hold

$$f(z(k, i+1)) \le f(z(k, i)) + \beta \lambda_{i+1}^{(k,0)} \nabla_{i+1} f(z(k, i))^T d_{i+1}^{(k,0)} \le f(z(k, i))$$

which also implies

$$f(z(k+1, 0)) = f(z(k, m)) \le f(z(k, i+1)) \le f(z(k, i)) \le f(z(k, 0)) = f(z(k-1, m)). \tag{22}$$
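The following Python sketch mirrors the loop structure of Algorithm 2 in the simplest setting. It assumes Euclidean projections onto box-shaped blocks (so that each block metric function is of the form (9) with a squared Euclidean distance), a fixed parameter `sigma` and a fixed number of inner iterations; these choices, the function names and the toy problem are illustrative assumptions, not the authors' implementation. The recorded values of $f$ at the partial updates $z(k, i)$ can be used to check the monotonicity discussed above.

```python
def clip(v, lo, hi):
    """Euclidean projection of the block v onto the box [lo, hi]^{n_i}."""
    return [min(max(t, lo), hi) for t in v]


def cbggp(f, grad_block, x, boxes, sigma=1.0, inner=2, beta=1e-4, delta=0.5,
          outer=20):
    """Cyclic block gradient projection sketch.
    x: list of blocks; boxes[i] = (lo, hi); grad_block(x, i): block gradient."""
    x = [list(block) for block in x]
    history = [f(x)]
    for k in range(outer):                       # outer iterations
        for i, (lo, hi) in enumerate(boxes):     # step 2: cycle over the blocks
            for ell in range(inner):             # step 2.3: inner iterations
                g = grad_block(x, i)
                y = clip([xi - sigma * gi for xi, gi in zip(x[i], g)], lo, hi)
                d = [yi - xi for yi, xi in zip(y, x[i])]       # step 2.3.2
                slope = sum(gi * di for gi, di in zip(g, d))
                if slope > -1e-14:               # block i is (numerically) fixed
                    break
                fx, lam = f(x), 1.0
                while True:                      # step 2.3.3: Armijo backtracking
                    trial = [list(block) for block in x]
                    trial[i] = [xi + lam * di for xi, di in zip(x[i], d)]
                    if f(trial) <= fx + beta * lam * slope:
                        break
                    lam *= delta
                x = trial                        # step 2.3.4
            history.append(f(x))                 # value of f at z(k, i)
    return x, history


# Hypothetical toy problem with two scalar blocks on [0, 1] x [0, 1]
f = lambda x: (x[0][0] - 2.0) ** 2 + (x[1][0] + 1.0) ** 2 + x[0][0] * x[1][0]


def grad_block(x, i):
    a, b = x[0][0], x[1][0]
    return [2.0 * (a - 2.0) + b] if i == 0 else [2.0 * (b + 1.0) + a]


x_end, history = cbggp(f, grad_block, [[0.5], [0.5]], [(0.0, 1.0), (0.0, 1.0)])
```

On this convex toy problem the iterates reach the constrained minimizer at a corner of the box after one sweep, and the recorded objective values never increase from one partial update to the next.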


We are now ready to give the first result about Algorithm 2. 

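Proposition 4 Let $\{x^{(k)}\}_{k \in \mathbb{N}}$ be the sequence generated by Algorithm 2. Suppose that for some $i \in \{0, \dots, m\}$ the sequence $\{z(k, i)\}_{k \in \mathbb{N}}$ admits a limit point $\bar{z}$. Then $p_{i+1}(\bar{z}; h^{i+1}_{\sigma_{i+1}}) = \bar{z}_{i+1}$ $\forall \sigma_{i+1} \in S_{i+1}$ if $i < m$, while $p_1(\bar{z}; h^1_{\sigma_1}) = \bar{z}_1$ $\forall \sigma_1 \in S_1$ if $i = m$.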

Proof. Suppose first that $i < m$. From Lemma 1, we only need to show that there exists $\bar{\sigma}_{i+1} \in S_{i+1}$ such that equality $p_{i+1}(\bar{z}; h^{i+1}_{\bar{\sigma}_{i+1}}) = \bar{z}_{i+1}$ holds.

Assume by contradiction that $p_{i+1}(\bar{z}; h^{i+1}_{\sigma_{i+1}}) \ne \bar{z}_{i+1}$ for all $\sigma_{i+1} \in S_{i+1}$. Let $K$ be the set of indices such that $\{z(k, i)\}_{k \in K}$ converges to $\bar{z}$ and $\{\sigma_{i+1}^{(k,0)}\}_{k \in K}$ converges to some $\bar{\sigma}_{i+1} \in S_{i+1}$. If $\|p_{i+1}(\bar{z}; h^{i+1}_{\bar{\sigma}_{i+1}}) - \bar{z}_{i+1}\| = 2\epsilon > 0$, the continuity of the generalized projection operator with respect to all its arguments guarantees that, for $k \in K$ sufficiently large, we have

$$\|d_{i+1}^{(k,0)}\| > \epsilon > 0,$$

where $d_{i+1}^{(k,0)} = p_{i+1}(z(k, i); h^{i+1}_{\sigma_{i+1}^{(k,0)}}) - x_{i+1}^{(k,0)}$ (see also Step 2.3.2 of Algorithm 2). Then, by applying Lemma 1 (ii) we have

$$\nabla_{i+1} f(z(k, i))^T d_{i+1}^{(k,0)} < -\eta < 0, \tag{23}$$

where $\eta$ is some positive scalar.

On the other hand, inequalities (22) guarantee that, for all $i$, we have $\lim_{k \to \infty} f(z(k, i)) = f(\bar{z})$, thus we obtain that

$$\lim_{k \to \infty,\, k \in K} f(z(k, i)) - f(x_1^{(k+1)}, \dots, x_i^{(k+1)}, x_{i+1}^{(k)} + \lambda_{i+1}^{(k,0)} d_{i+1}^{(k,0)}, x_{i+2}^{(k)}, \dots, x_m^{(k)}) = 0.$$

Moreover, since $\{z(k, i)\}_{k \in K}$ is a convergent sequence, it is also bounded. Therefore the sequence $\{d_{i+1}^{(k,0)}\}_{k \in K}$ is bounded and Proposition 3 implies that

$$\lim_{k \to \infty,\, k \in K} \nabla_{i+1} f(z(k, i))^T d_{i+1}^{(k,0)} = 0,$$

which contradicts (23).

The same arguments can be applied also when $i = m$, since $z(k, m) = z(k+1, 0)$. □
The previous proposition is crucial for proving the main convergence result 
for Algorithm 2, given below. 

Theorem 1 Let $\{x^{(k)}\}_{k \in \mathbb{N}}$ be the sequence generated by Algorithm 2 and assume that $\bar{x}$ is a limit point of $\{x^{(k)}\}_{k \in \mathbb{N}}$. Then $\bar{x}$ is a limit point also for the sequences $\{z(k, i)\}_{k \in \mathbb{N}}$ for any $i = 1, \dots, m-1$ and it is a stationary point for problem (1).

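Proof. The proof runs by induction on the block index $i$ and on the inner iteration number $\ell$ and it is similar to that of Theorem 4.2 in [7]. Since $\bar{x}$ is a limit point for $\{x^{(k)}\}_{k \in \mathbb{N}} = \{z(k, 0)\}_{k \in \mathbb{N}}$, from Proposition 4 it follows that, denoting by $K_0$ a set of indices such that $\{x^{(k)}\}_{k \in K_0}$ converges to $\bar{x}$ and $\{\sigma_1^{(k,0)}\}_{k \in K_0}$ converges to some $\bar{\sigma}_1 \in S_1$, we have $p_1(\bar{x}; h^1_{\bar{\sigma}_1}) = \bar{x}_1$ and

$$\lim_{k \to \infty,\, k \in K_0} \|d_1^{(k,0)}\| = 0.$$

From Step 2.3.4 of Algorithm 2, it follows that $\lim_{k \to \infty,\, k \in K_0} \|x_1^{(k,1)} - x_1^{(k)}\| = 0$, i.e., $\bar{x}_1$ is a limit point also for the sequence $\{x_1^{(k,1)}\}_{k \in \mathbb{N}}$. Introducing a subset of indices $K_1 \subseteq K_0$ such that the sequence $\{x_1^{(k,1)}\}_{k \in K_1}$ converges to $\bar{x}_1$ and $\{\sigma_1^{(k,1)}\}_{k \in K_1}$ converges to some $\hat{\sigma}_1$, we have

$$\lim_{k \to \infty,\, k \in K_1} d_1^{(k,1)} = \lim_{k \to \infty,\, k \in K_1} p_1((x_1^{(k,1)}, x_2^{(k)}, \dots, x_m^{(k)}); h^1_{\sigma_1^{(k,1)}}) - x_1^{(k,1)} = p_1(\bar{x}; h^1_{\hat{\sigma}_1}) - \bar{x}_1 = 0,$$

where the second equality follows from the continuity of the generalized projection operator and the third one is a consequence of Proposition 4.

Using the same arguments, by induction on $\ell$ we can conclude that, for each $\ell = 0, \dots, L_1^{(k)} - 1$, there exists a suitable subset of indices $K_\ell$ such that $\lim_{k \to \infty,\, k \in K_\ell} d_1^{(k,\ell)} = 0$ and we obtain

$$\|x_1^{(k+1)} - x_1^{(k)}\| \le \sum_{\ell=0}^{L_1^{(k)}-1} \lambda_1^{(k,\ell)} \|d_1^{(k,\ell)}\| \le \sum_{\ell=0}^{L_1^{(k)}-1} \|d_1^{(k,\ell)}\| \to 0 \quad \text{as } k \to \infty,\ k \in \bar{K}_1,$$

where $\bar{K}_1 = \bigcap_{\ell=0}^{L_1-1} K_\ell$. Thus, the point $\bar{x}$ is a limit point also for the sequence $\{z(k, 1)\}_{k \in \mathbb{N}} = \{(x_1^{(k+1)}, x_2^{(k)}, \dots, x_m^{(k)})\}_{k \in \mathbb{N}}$ and Proposition 4 ensures that $p_2(\bar{x}; h^2_{\bar{\sigma}_2}) = \bar{x}_2$ for some $\bar{\sigma}_2 \in S_2$.

Proceeding by induction on $i$ and employing the same arguments used for $i = 1$, we prove that $\bar{x}$ is a limit point of the sequences $\{z(k, i)\}_{k \in \mathbb{N}}$ for any $i = 1, \dots, m-1$. As a result of this, invoking again Proposition 4, we can conclude that for any $i = 1, \dots, m$ there exists $\bar{\sigma}_i \in S_i$ such that $p_i(\bar{x}; h^i_{\bar{\sigma}_i}) = \bar{x}_i$. Therefore, by Lemma 1 (i) we can conclude that $\bar{x}$ is a stationary point of problem (1). □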
4 Conclusions

In this paper we addressed the general problem of the constrained minimization of a differentiable function in which the unknown can be partitioned in blocks, each with a convex and closed feasible set. To this end, we considered block-coordinate first-order methods exploiting suitable descent directions based on very general projection operators. In particular, we introduced a class of generalized projection operators based on non-Euclidean metrics, which includes as special cases Bregman projections, proximity and proximal gradient operators. Our approach combines the properties of these generalized projections with those of the Armijo linesearch strategy to obtain a generalized gradient descent method able to produce a sequence of iterates whose limit points are stationary.

Future work will include a generalization of these results to nonsmooth objective functions, the analysis of suitable strategies to design the parameters defining the metric functions and the extensive application of the proposed optimization approaches to real-world problems in astronomy and microscopy.